U.S. DEPARTMENT OF COMMERCE
Office of Inspector General

PUBLIC RELEASE

BUREAU OF THE CENSUS
Actions to Address the Impact on the
Accuracy and Coverage Evaluation
of Suspected Duplicate Persons
in the 2000 Decennial Census

Inspection Report No. OSE-13812/March 2001

Office of Systems Evaluation


U.S. Department of Commerce                  Final Inspection Report OSE-13812
Office of Inspector General                                         March 2001


We shared our concerns with senior bureau officials during meetings held in January and
February 2001. We recommended that they provide an analysis of the likely impact of their
method for handling the reinstated person records. We also recommended that they ensure the
impact was considered in the bureau’s process for reviewing the A.C.E. and census results for
making a recommendation to the Secretary of Commerce about whether or not the census counts
should be statistically adjusted for redistricting. The bureau was responsive to our
recommendations. It has been preparing a chapter, entitled Accuracy and Coverage Evaluation
Survey: Effect of Excluding “Late Census Adds,” for the report supporting its recommendation
regarding whether or not to adjust, and has already discussed the issue in the reports and
analyses supporting its recommendation. Our report documents the issues and concerns discussed
in those meetings and actions agreed to or taken by the bureau.


BACKGROUND

The Constitution mandates that a census of the nation’s population be taken every 10 years for
the purpose of congressional apportionment. Census data is also used for state redistricting and
the allocation of federal funds. Data from the decennial census provides official, uniform
information about the nation’s people and their social, demographic, and economic
characteristics.

In counting the population, the bureau relies on its Master Address File (MAF) to identify where
people reside. The address file built from the MAF for use in the census is termed the decennial
MAF. The quality of addresses maintained by the MAF directly affects the accuracy,
completeness, and cost of the decennial. To overcome the historic undercoverage of housing
units for the 2000 census, the bureau devised an inclusive approach for retaining addresses in its
file and also used a wider variety of sources to obtain these addresses. In our September 2000
report on the MAF, we found that this inclusive approach resulted in an unknown number of
duplicate addresses, which could cause duplicate enumerations. [3]

During a decennial census, the Census Bureau attempts to count and gather information about
every resident in the country. However, in any decennial, some enumerations that should have
been excluded are included, and some portion of the population that should have been included is
missed. The first source of error leads to an overcount; the second, to an undercount. Every
census for which the effect of these errors has been systematically measured has shown a net
undercount; that is, the number of residents missed is greater than the number counted in error.
Studies going back to 1940 also show a net differential undercount for minority population
groups, meaning that minorities are missed at a higher rate than the white population. For 2000,
the bureau planned to measure coverage and reduce the differential undercount through the
A.C.E. [4] The A.C.E. is a statistical methodology based on an independent sample of the
population, which is then compared or matched with the census records to determine persons
missed and erroneous enumerations. The A.C.E. uses a process termed “dual system estimation”
to estimate the net undercount of various demographic subgroups of the population (called
post-strata) and to calculate the “coverage correction factors” which can be used to adjust the
census counts.

[3] A Better Strategy is Needed for Managing the Nation’s Master Address File.
[4] “Accuracy and Coverage Evaluation; Statement of the Feasibility of Using Statistical Methods to Improve the
Accuracy of the Census 2000,” Federal Register, Department of Commerce, Bureau of the Census, June 20, 2000.

In designing and conducting the census and A.C.E., the bureau strives to balance two types of
error. One type, sampling error, occurs only in the A.C.E. Sampling error, which is quantified
by sampling variance, occurs because a sample is used to represent a population. The other type,
nonsampling error, occurs in both the census and the A.C.E. For nonsampling error, the bureau
is particularly concerned with systematic errors or biases. The most serious source of bias in the
census is coverage error, which results from people being missed or from erroneous
enumerations (including duplicate enumerations). The most notable example of this is the net
undercount, including the differential undercount. Bias can occur in the A.C.E. as a result of
errors in matching, errors in accounting for missing information, or other systematic errors in
collection or processing. Bias caused by systematically missing individuals in both the census
and A.C.E. is termed correlation bias. [5]

The issue of whether statistical sampling could be used in the census was brought before the
United States Supreme Court, which ruled in January 1999 that such sampling could not be used
for congressional apportionment purposes. The Court did not prohibit the use of statistical
sampling for other demographic purposes, including redistricting.

The Secretary of Commerce is required by law to report redistricting data to the states within one
year after the decennial census date. For the 2000 census, the Secretary must report such data to
the states by April 1, 2001. By the end of February, the bureau had completed its internal
assessment of whether the A.C.E. should be used to adjust the census counts. An Executive
Steering Committee for A.C.E. Policy (ESCAP), consisting of 12 senior career bureau officials,
was responsible for reviewing census and A.C.E. data and preparing a report for the bureau
director recommending whether the adjusted or unadjusted census data should be used. The
ESCAP’s March 1, 2001, report recommended that the unadjusted data be released as the Census
Bureau’s official redistricting data because the information available at that time was insufficient
to conclude that the adjusted census data would be more accurate. The Acting Director of the
Census Bureau concurred with the ESCAP recommendation, and on March 6, 2001, the
Secretary decided that the unadjusted data would be used.

[5] “Accuracy and Coverage Evaluation; Statement of the Feasibility of Using Statistical Methods to Improve the
Accuracy of the Census 2000,” Federal Register, Department of Commerce, Bureau of the Census, June 20, 2000.


OBJECTIVES, SCOPE, AND METHODOLOGY

The objective of this evaluation was to determine whether the bureau’s methodology for
handling the reinstatement of 2.4 million person records into the decennial census minimized the
impact on the accuracy and reliability of the A.C.E. This issue arose during our review of the
process used in the A.C.E. for automated person matching.

During January and February 2001, we briefed six members of the ESCAP regarding our
concerns and our recommendations for addressing them. Members whom we briefed included
the Associate Director for Decennial Census, who was the ESCAP chair, and the Assistant
Director for Decennial Census. Because these officials have already acknowledged our concerns
and agreed with our recommendations, we are issuing this report in final. This report documents
the issues and the bureau’s approach to addressing them. Bureau officials were given the
opportunity to review the report for technical accuracy. Where appropriate, their comments have
been incorporated.

We conducted our fieldwork between November 2000 and March 2001. We did not evaluate the
bureau’s process for removing duplicates or thoroughly review the analysis regarding these
issues in the documentation supporting the bureau’s recommendation.

This evaluation was conducted in accordance with the Quality Standards for Inspections issued
by the President’s Council on Integrity and Efficiency and was performed under the authority of
the Inspector General Act of 1978, as amended, and Department Organization Order 10-13,
dated May 22, 1980, as amended.


OBSERVATIONS AND CONCLUSIONS

The Bureau Addressed Our Concerns About
the Effect of Reinstatements on the A.C.E.

Because of its concerns about address duplication, the bureau monitored the number of housing
unit addresses by comparing demographic benchmarks to the decennial MAF at several stages of
the decennial. By June 2000, some counties continued to show higher than expected coverage
when compared to the benchmarks. The bureau decided to conduct fieldwork in some of these
counties to investigate the problem. Using the April 2000 version of the decennial MAF, bureau
staff examined approximately 20,000 addresses and found that over 13 percent were either
duplicated (11.6 percent) or nonexistent (1.5 percent). The bureau was sufficiently concerned
about address duplication and the resulting potential duplicate enumerations that in July 2000 it
began devising possible methods for identifying duplicates. [6]

Special bureau operations involving address and person matching were developed and
implemented for the purpose of identifying and removing duplicate enumerations that remained
on the decennial file at the end of the census. The process was termed the Duplicate Housing
Unit Operations. Its design was not specified before the census, but rather was developed during
the census in response to the problem of duplication. The bureau acknowledged that this
operation made mistakes of both exclusion and inclusion, but believed it was necessary to avoid
seriously impairing the accuracy of the apportionment numbers. [7]

[6] Overview of the Duplicate Housing Unit Operations, Fay Nash, U.S. Census Bureau, November 7, 2000.

Through the rules of this process, the bureau was able to identify 6 million persons in 2.4 million
housing units as potential duplicates. However, according to the bureau, there was not enough
time to resolve the status of suspected duplicates before the census file was needed to begin the
tightly scheduled A.C.E. matching and follow-up operations. In consultation with the bureau
official responsible for conducting the A.C.E., the bureau decided to provide to the A.C.E.
person matching operation a version of the census file that did not include the suspected
duplicated person records. The bureau’s rationale was that removing the suspected duplicates
from the A.C.E. matching process was better than retaining them because the model used to
calculate the adjustment works better with more accurate A.C.E. sample and Census 2000
counts. In addition, the bureau had a process for dealing with late census data and felt that any
reinstatements could be handled through this process. [8]

After analyzing the suspected duplicates, the bureau removed approximately 1.4 million housing
units and 3.6 million persons from the census and reinstated approximately 1 million housing
units and 2.4 million persons into the census. Our review of the rules that the bureau used tends
to suggest that the reinstatements comprised primarily duplicate persons residing in
nonduplicated housing units. [9] The bureau’s goal was to delete only duplicate housing units, and
therefore duplicate persons in nonduplicated housing units were reinstated. The bureau worked
under the assumption that these duplicate persons were similar to those residing in housing units
known to be occupied but with unknown persons. In this instance, the bureau would have added
persons to households occupied by unknown persons through a statistical process termed whole
person imputation. In the case of the reinstated persons residing in nonduplicated housing units,
the bureau reasoned that these persons served as the imputations. The bureau also noted that all
of the reinstated person records represented less than 1 percent of the total population and that it
expected such records to be represented in the A.C.E. in about the same proportion.

The bureau’s memorandum documenting its process for incorporating late census data into the
A.C.E. evaluated four options for handling late additions and recommended the option that the
bureau deemed to be the most comprehensive and believed would avoid giving the appearance of
data manipulation. [10] According to the bureau, this option would not cause the expected value of
the results of dual system estimation to be biased if the reinstated person records met certain
assumptions. [11] Important among these is that the match rate for correctly enumerated reinstated
person records is the same as for the rest of the A.C.E. [12]

[7] Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy, U.S. Census Bureau,
March 1, 2001, p. 12.
[8] Treatment of Late Census Data for Accuracy and Coverage Evaluation Estimation.
[9] Specification for Reinstating Addresses Flagged as Deletes on the Hundred Percent Census Unedited File
(HCUF), DSSD Census 2000 Procedures and Operations Memorandum Series #D-11, Memorandum for Susan
Miskura, Chief, Decennial Management Division, from Howard Hogan, Chief, Decennial Statistical Studies
Division, November 7, 2000, and Results of Reinstatement Rules for the Housing Unit Duplication Operations,
Memorandum for Preston J. Waite, Assistant to the Associate Director for the Decennial Census, from Susan M.
Miskura, Chief, Decennial Management Division, November 21, 2000, Attachment 2.
[10] Treatment of Late Census Data for Accuracy and Coverage Evaluation Estimation, p. 5.

Bureau officials stated that they did not have enough time to test whether the reinstated person
records conformed to the assumptions before they had to make the recommendation on whether
or not to adjust the census data; consequently, they could not demonstrate that their approach
caused little or no bias in the dual system estimates. The bureau acknowledged that the
reinstatements would contribute to variance in the estimates, thereby increasing the risk of
obscuring whether the A.C.E. estimates are more accurate than the census counts. Bureau
officials told us that although the specific variance caused by the reinstatements would not be
known by the ESCAP at the time it had to make its recommendation on whether or not to adjust,
this additional variance would be reflected in the total variance, an indicator that would be
known and that the ESCAP would consider in its deliberations.

Because the bureau did not plan to analyze the reinstated person records in any depth before the
recommendation on whether or not to adjust and did not have previous analysis documenting the
likely characteristics of such records, we became concerned about the bureau’s ability to defend
its assumptions about the effects of the reinstatements on the recommendation. Our concerns
were heightened because, as noted previously, our review of the rules that the bureau used tends
to suggest that the reinstatements comprised primarily duplicate persons residing in
nonduplicated housing units and because bureau officials told us that they suspected the
reinstatements were geographically clustered. [13] Because of these issues, we recommended that
the bureau provide written analysis to support its position that the reinstatements would
introduce little or no bias into the dual system estimate.

A related concern that we also raised was the number and impact of whole person imputations in
the census. This concern is related because these imputations are treated in the same way as the
reinstatements in the dual system estimate. At the end of the census, the bureau reported that
there had been approximately 5.7 million people added to the census count by imputation, more
than 2.5 times the number added in 1990. The imputations, along with the 2.4 million
reinstatements, totaled 8.1 million persons, about 2.9 percent of the population count. Given the
importance of an accurate census, we emphasized to senior census officials that the bureau
should be able to demonstrate whether the reinstatements, along with the imputations,
exacerbated overcoverage or masked undercoverage.

The Associate Director for Decennial Census agreed to ensure that the ESCAP would consider
our concerns in making its recommendation. In addition, the bureau official responsible for the
A.C.E., also an ESCAP member, agreed to conduct sensitivity testing of the assumptions and
document the results. This analysis has been completed and is undergoing internal review by the
bureau. It will be published as a chapter of the report supporting the ESCAP’s recommendation
entitled, Accuracy and Coverage Evaluation Survey: Effect of Excluding “Late Census Adds.”

[11] Expected value is an average of estimates derived from all possible samples.
[12] Accuracy and Coverage Evaluation: Data and Analysis to Inform the ESCAP Report, Howard Hogan, U.S. Census
Bureau, March 1, 2001, p. 52.
[13] According to the bureau’s demographic analysis, erroneous addresses occurred with greater frequency in certain
counties.

Importantly, the issues that we raised pertaining to the reinstatements and imputations were
considered in evaluation reports supporting the recommendation of the ESCAP, as well as in the
ESCAP report itself. The principal analysis and discussions are presented in two reports, Report
of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy and
Accuracy and Coverage Evaluation: Data and Analysis to Inform the ESCAP Report.

According to these reports, the ESCAP reviewed the evaluation report data, as well as other
information, and concluded that the key assumptions underlying the methodology for including
the reinstated person records in the A.C.E., such as match ratios for the correct enumerations,
could be expected to hold, although they would not hold perfectly. The ESCAP believed that the
measures available for assessing the effects of sampling variance and correlation bias would
include the effects of the treatment of late additions and whole person imputations. [14] However,
the ESCAP was concerned that geographic clustering of the reinstated person records and
imputations might have increased another type of error and further reviewed this effect. [15] The
ESCAP concluded that the data did indicate some degree of geographic clustering within
post-strata and noted that it took these findings into consideration when reviewing the results of
the adjustments.

Although we have not had the opportunity to thoroughly review the analysis supporting the
bureau’s decision regarding error added by reinstatements, we believe that the bureau’s actions
were responsive to our concerns and recommendations in the short time frame available for
reviewing the data and making the recommendation on whether or not to adjust. The bureau
plans to perform further evaluation studies to assess the impact of the reinstatements and
imputations.

We believe that such studies are appropriate to better understand both the impact on dual system
estimation in the 2000 Decennial Census and the impact of similar late data requirements in
future censuses and surveys since dual system estimation is an important bureau methodology
for measuring data quality. Further, to help avoid similar problems with the address file in the
future, we reaffirm the recommendations presented in our MAF report regarding approaches for
addressing housing unit overcoverage and undercoverage.

Should you have any questions regarding this report, please contact me at (202) 482-4661 or
Judith Gordon, Assistant Inspector General for Systems Evaluation, at (202) 482-5643. We
would like to thank Census Bureau headquarters staff for the cooperation and courtesies
extended to us during our review.

cc: Lee Price, Acting Under Secretary for Economic Affairs

[14] Report of the Executive Steering Committee for Accuracy and Coverage Evaluation Policy, p. 26.
[15] This type of error is referred to as synthetic error and is related to the distribution of the measured net undercount
to local areas and demographic subgroups.